The TigerSHARC DSP Architecture

Authors

  • José Fridman
  • Zvi Greenfield
Abstract

In the past two years, several multiple data path and pipelined digital signal processors have been introduced to the marketplace. This new generation of DSPs takes advantage of higher levels of integration than were available for their predecessors. It also incorporates multiple execution units on a single core as well as deep execution pipelines. For an introduction to recent trends in DSPs see Eyre and Bier, and for a comprehensive analysis of DSP chips see the DSP buyer’s guide and Levy. Here, we describe a new parallel DSP architecture called TigerSHARC. We focus on the computational aspects of its core and on-chip memory architecture. To sustain the high computation rates of cores with multiple execution units, memory subsystems must scale proportionately. We based our solution to the high-bandwidth demands of this parallel DSP core on a memory architecture characterized by what we call short-vector processor techniques. These techniques are essentially small-width vector processor interfaces. In addition to the architectural description, we also present an application example of a finite-length impulse response, or FIR, filter. We use this example to illustrate a technique used to map this class of algorithms to a parallel, vector-oriented processor. The FIR filter is a representative member of a large class of DSP algorithms, namely any structure with delay lines such as infinite-length impulse response, or IIR, structures, equalizers, and multirate filters, all of which share similar solutions. (Two-dimensional extensions of these algorithms, such as 2D filtering and convolution used in imaging, can also be solved using extensions to the techniques presented here.) To efficiently map this class of algorithms to this parallel DSP, we must address two related problems: the distribution of computation among several execution units, and the provision of adequate alignment between data and filter coefficients.
To map the delay line structure of the FIR, we apply an algorithmic transformation that exposes its parallelism in a form suited to the target architecture. This transformation produces a high-efficiency implementation by relying only on aligned short-vector memory accesses. This example also shows that the conventional single-instruction, multiple-data (SIMD) dispatch mechanism, although very effective in simple linear algebra and matrix operations, may be overly restrictive when applied to this class of DSP algorithms. As a result, non-SIMD execution is required to achieve high efficiency.
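The alignment problem described in the abstract can be illustrated with a small sketch. It is not the transformation used in the paper, whose details are not given here. It is one generic way to compute an FIR filter using only aligned short-vector loads, under assumptions of our own: a vector width `V` of four elements, zero initial filter state, and hypothetical helper names (`load_aligned`, `fir_short_vector`). Each shifted data window is assembled from two adjacent aligned vectors rather than fetched with an unaligned access.

```python
V = 4  # assumed short-vector width (elements per aligned memory access)


def fir_reference(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def load_aligned(x, block):
    """Model an aligned load of x[V*block : V*block + V].

    Blocks before the start of the signal read as zeros (zero initial state).
    """
    if block < 0:
        return [0] * V
    return x[V * block : V * block + V]


def fir_short_vector(x, h):
    """Compute the same FIR output, V outputs at a time, from aligned loads only."""
    assert len(x) % V == 0, "this sketch assumes the input length is a multiple of V"
    y = []
    for b in range(len(x) // V):
        acc = [0] * V  # accumulators for y[V*b] .. y[V*b + V - 1]
        for k, coeff in enumerate(h):
            q, r = divmod(k, V)  # tap delay k = q*V + r
            hi = load_aligned(x, b - q)
            if r == 0:
                window = hi  # the window coincides with an aligned vector
            else:
                lo = load_aligned(x, b - q - 1)
                # Select V elements starting r positions before the aligned
                # boundary, using data that is already loaded.
                window = (lo + hi)[V - r : 2 * V - r]
            for i in range(V):  # one multiply-accumulate per lane
                acc[i] += coeff * window[i]
        y.extend(acc)
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    h = [1, 2, 3, 4, 5]
    print(fir_short_vector(x, h) == fir_reference(x, h))
```

In this model the inner loop over `i` stands in for the parallel execution units, and the slice of `lo + hi` stands in for a register-level shift. A real implementation would keep `lo` and `hi` in registers across taps instead of reloading them.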

Related resources

Modern DSP Architectures

In this seminar contribution I’m going to introduce modern DSP architectures. After giving a short overview of the history of Digital Signal Processing, I will then discuss the differences between Digital Signal Processing and general purpose computing. These differences have implications for the architecture of DSPs that I am going to discuss shortly. The main part will introduce the TigerSha...

Improving DSP Performance with a Small Amount of Field Programmable Logic

We show a systematic methodology to create DSP + field-programmable logic hybrid architectures by viewing it as a hardware/software codesign problem. This enables an embedded processor architect to evaluate the trade-offs between the increase in die area due to the field-programmable logic and the resultant improvement in performance or code size. We demonstrate our methodology with the implementatio...

NeuroMatrix® NM6403 DSP with Vector/Matrix engine

The paper describes the architecture of the NeuroMatrix® NM6403 DSP designed for image processing, signal processing and neural network emulation [1,2]. The paper includes a brief description of the processor structure and its instruction set. The NM6403 is the first DSP based on the NeuroMatrix® Core (NMC), which comprises an original 32-bit VLIW RISC processor and a 64-bit SIMD Vector co-processor (VCP...

Performance Analysis of a Chaos-Based Multi-User Communication System Implemented in DSP Technology

This paper presents the implementation of a multi-user chaos-based communication system in DSP. The system is based on the chaotic phase shift keying (CPSK) digital modulation scheme, where chaotic signals are used as the spreading sequences of a CDMA system. Using chaotic signals offers the advantages of increased security and higher system capacity compared with conventional sequences. The ai...

Using Genetic Programming for Source-Level Data Assignment to Dual Memory Banks

Due to their streaming nature, memory bandwidth is critical for most digital signal processing applications. To accommodate these bandwidth requirements, digital signal processors are typically equipped with dual memory banks that enable simultaneous access to two operands if the data is partitioned appropriately. Fully automated and compiler-integrated approaches to data partitioning and memory...

Journal:
  • IEEE Micro

Volume 20

Publication date: 2000